1. INTRODUCTION

The power of low-density parity-check (LDPC) codes defined over the high-order finite field GF(q) has
long been recognized. For short and medium codeword lengths, these codes outperform binary LDPC codes.
Non-binary LDPC (NB-LDPC) codes have been shown to outperform turbo convolutional codes (TCC) and binary
LDPC codes, combining, for short codewords, the steep waterfall region typical of TCC with the low error
floor typical of binary LDPC codes [1]-[3]. This gain, however, comes at the expense of greater decoding
complexity. Indeed, as q increases, so does the complexity of the decoder, which limits design options and
motivates the search for simpler decoding algorithms.

In recent years, several efforts have been made to reduce the complexity of NB-LDPC decoders, and
various associated algorithms and architectures have been proposed. A reduced-complexity decoding algorithm
for NB-LDPC codes has been proposed in [1], [4]-[6]. This algorithm, known as the group shuffled belief
propagation (GSBP) extended min-sum algorithm, is a generalization of the min-sum (MS) algorithm. To
decrease the computational complexity of the check node update, the approach uses only a restricted number
n_m of the most reliable values of each message at the check node input [7]-[9].
A new GSBP decoder implementation has been suggested. This technique is unique in that it solves
the memory problem of non-binary LDPC decoders while drastically decreasing the complexity per iteration.
The main feature of the new GSBP decoder is that it extends the truncation of vector messages to the check
and variable node inputs. To reduce the impact of the messages on decoding speed, the authors truncated
messages from q to n_m entries, and they applied a well-tuned offset correction to compensate for the
resulting performance loss. The complexity of the GSBP decoder is thus theoretically dominated by
O(n_m log n_m), with n_m ≪ q, which is a significant reduction compared with all previous techniques
[10]-[15].

Our research is based on the GSBP algorithm [10], [16]-[20], which decreases the number of GF
elements that must be processed in each decoding step from q to n_m, with n_m ≪ q, while providing
excellent error-correction performance. We map the GSBP algorithm onto a low-latency, prefetching elementary
check node (ECN) that relaxes redundancy control at low error rates while sacrificing only a small amount of
error-correction performance. A new method is presented to bring the variable node (VN) latency closer to
that of the check node (CN); the design relies on a significant reduction of the combinatorial search space
and of the computation time. To achieve the highest possible pipeline efficiency, we propose a conflict-free
memory that handles the data dependencies caused by unstructured codes. A complete decoder for a
(2, 4)-regular (960, 480) LDPC code over GF(64) is prototyped on a Cyclone IV EP4CE115F29C8 field
programmable gate array (FPGA) embedded on the Altera DE2-115 board. The decoder achieves good error
correction performance.

The rest of this article is organized as follows. Section 2 presents the symbol model and LDPC codes over
GF(q). Section 3 presents the LDPC implementation and the design architecture of GSBP. Section 4 presents
the FPGA results and the prototype, and Section 5 concludes the article.


2. SYMBOL MODEL, LDPC CODES OVER GF(q) 

LDPC codes are a class of linear block codes. Non-binary LDPC codes over GF(q) reduce to binary
LDPC codes over GF(2) when q = 2. The elements of GF(q) are represented as vectors of a vector space over
GF(2), and q is chosen as:


$q = 2^m$ (1)


where m is a positive integer and m > 1. The code is defined by the ultra-sparse parity check matrix H, and
its rate R is given as follows:


$R = \frac{N - M}{N}$ (2)


where M = N - K, N denotes the codeword length and K denotes the message length. The sparse parity check
matrix H of size M × N is constructed with a column weight of at least two and a row weight that is as
uniform as possible. This construction method ensures:

- Every column has a fixed number γ of non-zero entries.

- Every row has a fixed number ρ of non-zero entries.

- No two rows (or two columns) have non-zero entries in more than one common position.

The first two conditions give H constant row and column weights, so that it defines a regular LDPC code.
The third ensures that the Tanner graph of the code is free of cycles of length four. To build the parity
check matrix H, permutation matrices with non-binary entries from GF(q) are first generated and then
dispersed over a base matrix. An LDPC code is thus a linear block code whose sparse parity check matrix H
has a low density of non-zero entries. Such codes are represented by Tanner graphs, also known as bipartite
graphs, as shown in Figure 1.
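The three construction conditions can be checked mechanically. The following Python sketch (all function names are hypothetical, not from the paper) tests a constant column weight γ, a constant row weight ρ, and the row-column constraint that rules out length-four cycles. Only the positions of the non-zero entries matter, so the GF(q) values are arbitrary; the toy matrix built from the edges of the complete graph K5 is an illustrative assumption, not the permutation-matrix construction described above.

```python
from itertools import combinations

def column_weights(H):
    """Number of non-zero entries in each column of H (given as a list of rows)."""
    return [sum(1 for row in H if row[j] != 0) for j in range(len(H[0]))]

def row_weights(H):
    """Number of non-zero entries in each row of H."""
    return [sum(1 for x in row if x != 0) for row in H]

def is_regular(H, gamma, rho):
    """True if every column has weight gamma and every row has weight rho."""
    return (all(w == gamma for w in column_weights(H))
            and all(w == rho for w in row_weights(H)))

def satisfies_rc_constraint(H):
    """True if no two rows share non-zero entries in more than one column.

    This is equivalent to the Tanner graph having no cycle of length four.
    """
    supports = [{j for j, x in enumerate(row) if x != 0} for row in H]
    return all(len(a & b) <= 1 for a, b in combinations(supports, 2))

# Hypothetical toy example: a (2, 4)-regular H whose 10 columns are the edges
# of K5 (5 check nodes), filled with non-zero GF(4) values {1, 2, 3}.
edges = list(combinations(range(5), 2))
H = [[0] * len(edges) for _ in range(5)]
for j, (a, b) in enumerate(edges):
    H[a][j] = 1 + (j % 3)
    H[b][j] = 1 + ((j + 1) % 3)
print(is_regular(H, gamma=2, rho=4), satisfies_rc_constraint(H))  # True True
```

Because each column touches exactly two rows and two distinct edges of K5 share at most one vertex, every pair of rows overlaps in at most one column, which is exactly the four-cycle-free condition.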


2.1. System model 

Figure 2 shows an NB-LDPC coded modulation system. The LDPC encoder encodes the K information
symbols of the input message vector u over GF(q) into a codeword, producing an encoded vector v of length N.
The rate of the LDPC encoder is given by equation (2). The coded bits are modulated into QPSK symbols,
denoted X, and transmitted over the additive white Gaussian noise (AWGN) channel. The received signal r is
written as:


$r = X + n$ (3)


where n denotes the AWGN noise, a zero-mean Gaussian random sequence with variance:


where Eb/N0 is the SNR.
Figure 1. The Tanner graph over GF(q) and the parity check matrix H (8)
Figure 2. Encoder global block


3. LDPC IMPLEMENTATION: DESIGN ARCHITECTURE OF GSBP
3.1. Encoding 

Non-binary LDPC codes are defined by an M × N parity check matrix H whose entries belong to the
finite field GF(q). Each non-zero entry of H can take any of the q - 1 non-zero values of GF(q). Because H is
not initially in systematic form, we rewrite it as:


$H = [P \mid I_M]$ (5)


where $I_M$ denotes the M × M identity matrix and P denotes an M × K matrix, with K = N - M.
All arithmetic operations are performed over GF(q). A generator matrix of dimensions K × N can then be built:


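$G = [I_K \mid P^T]$ (6)

where $P^T$ is the transpose of P. Since GF(2^m) has characteristic 2, $-P^T = P^T$, so that
$G H^T = P^T + P^T = 0$.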


The encoder maps a message frame U of K information symbols over GF(q) into a codeword V of length
N symbols:


$V = U G$ (7)


The matrix multiplication is performed over the finite field GF(q). Finally, the encoder block outputs the
codeword V, which has a length of N = nP bits. For the multiplications, we use the efficient digit-serial
Karatsuba multiplier illustrated in Figure 3, which uses (3dm/2) AND gates, 6m + n + (3dm/2) + m2 + d - 7
XOR gates and 3m - 3 registers.
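As a software reference for this multiplier, the sketch below computes products in GF(2^m) by Karatsuba polynomial multiplication over GF(2) followed by reduction modulo an irreducible polynomial. It is bit-parallel Python, not a model of the digit-serial hardware schedule of Figure 3 or of its gate counts. The field GF(64) with p(x) = x^6 + x + 1 is an assumption, since the field polynomial is not stated here, and the function names are hypothetical.

```python
def clmul_schoolbook(a, b):
    """Carry-less product in GF(2)[x] of bit-packed polynomials a and b."""
    r = 0
    while b:
        if b & 1:
            r ^= a
        a <<= 1
        b >>= 1
    return r

def clmul_karatsuba(a, b, n):
    """Karatsuba carry-less product of two polynomials of degree < n.

    Each operand is split as a = a0 + a1*x^h; three half-size products
    replace the four of the schoolbook method, and the middle term is
    (a0 + a1)(b0 + b1) - a0*b0 - a1*b1, where '+' and '-' are both XOR.
    """
    if n <= 2:
        return clmul_schoolbook(a, b)
    h = n // 2
    mask = (1 << h) - 1
    a0, a1 = a & mask, a >> h
    b0, b1 = b & mask, b >> h
    low = clmul_karatsuba(a0, b0, h)
    high = clmul_karatsuba(a1, b1, n - h)
    mid = clmul_karatsuba(a0 ^ a1, b0 ^ b1, n - h) ^ low ^ high
    return (high << (2 * h)) ^ (mid << h) ^ low

def gf_reduce(c, m, poly):
    """Reduce a product of degree <= 2m - 2 modulo the degree-m polynomial poly."""
    for deg in range(2 * m - 2, m - 1, -1):
        if (c >> deg) & 1:
            c ^= poly << (deg - m)
    return c

def gf_mul(a, b, m=6, poly=0b1000011):
    """Multiply two GF(2^m) elements; defaults to GF(64), p(x) = x^6 + x + 1."""
    return gf_reduce(clmul_karatsuba(a, b, m), m, poly)

print(gf_mul(0b100000, 0b10))  # 3, i.e. x^5 * x = x^6 = x + 1
```

The recursion trades one multiplication for a few extra XORs at each level, which is the same trade-off the hardware multiplier exploits to reduce the AND-gate count.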


Figure 3. Efficient digit-serial Karatsuba multiplication 
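Equations (5) and (6) can be exercised end to end. The sketch below (hypothetical helper names; GF(64) with p(x) = x^6 + x + 1 assumed) encodes a message systematically as [U | U P^T] and checks that every parity check of H = [P | I_M] is satisfied. The random dense P is only a toy stand-in for the sparse structured matrix of the paper.

```python
import random

M_BITS, POLY = 6, 0b1000011   # assumed GF(64) with p(x) = x^6 + x + 1

def gf_mul(a, b):
    """Shift-and-add multiplication in GF(2^M_BITS) modulo POLY."""
    r = 0
    while b:
        if b & 1:
            r ^= a                 # addition in GF(2^m) is XOR
        b >>= 1
        a <<= 1
        if a & (1 << M_BITS):      # reduce when the degree reaches M_BITS
            a ^= POLY
    return r

def gf_dot(xs, ys):
    """Inner product over GF(2^M_BITS)."""
    acc = 0
    for x, y in zip(xs, ys):
        acc ^= gf_mul(x, y)
    return acc

def encode(u, P):
    """Systematic encoding with G = [I_K | P^T]: the codeword is [U | U P^T].

    Parity symbol i equals sum_k u[k] * P[i][k] over GF(q).
    """
    return list(u) + [gf_dot(u, row) for row in P]

def syndrome(v, H):
    """H V^T over GF(q); all zeros for a valid codeword."""
    return [gf_dot(row, v) for row in H]

# Toy example with a random dense P (K = M = 4), not the paper's sparse H.
K, M = 4, 4
rng = random.Random(0)
P = [[rng.randrange(64) for _ in range(K)] for _ in range(M)]
H = [P[i] + [1 if c == i else 0 for c in range(M)] for i in range(M)]  # [P | I_M]
u = [rng.randrange(64) for _ in range(K)]
v = encode(u, P)
print(syndrome(v, H))  # [0, 0, 0, 0]
```

Each syndrome entry is the parity symbol added to itself, which vanishes because addition in characteristic 2 is XOR.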


3.2. Top-level architecture of the decoder

Decoding non-binary codes raises two issues. The first is the iterative processing of the check and
variable nodes; the second is identifying the most likely codeword v that satisfies $v H^T = 0$, where the
probability of v is determined by the channel model. Figure 4 shows the top-level architecture design.

The GSBP algorithm extends the min-sum algorithm from binary codes to NB-LDPC codes. The messages
exchanged between the VN and CN processors are vectors of log-likelihood ratios (LLRs). The extended
min-sum GSBP algorithm has recently been proposed for non-binary LDPC decoding; it can reduce the number of
comparison operations by a factor of 3, resulting in lower hardware complexity without introducing any
significant performance degradation. A message in the GSBP algorithm is a vector of q sub-messages. Let
$x_j$ denote the j-th symbol of the codeword.

Let $A_j = [A_j(0), A_j(1), \ldots, A_j(q-1)]$ be the a priori channel information for $x_j$. The
sub-message $A_j(d)$ is a log-likelihood ratio (LLR) defined as
$A_j(d) = \log\left(\Pr(x_j = z_j) / \Pr(x_j = d)\right)$, where $z_j$ is the most likely (ML) symbol for
$x_j$. We denote by $\alpha_{i,j}$ and $\beta_{i,j}$ the V2C and C2V soft messages passed between the i-th CN
and the j-th VN, respectively.


Let $x_{i,j} = h_{i,j} \otimes x_j$, where ⊗ denotes multiplication over GF(q).


For the i-th CN, let the configuration set $\mathcal{L}_i(x_{i,j} = d)$ be the set of symbol sequences
such that $x_{i,j} = d$ and the i-th check-sum is satisfied. Define an s-truncated configuration as one in
which, for each $j' \in N_i \setminus j$ (where $N_i$ is the set of VNs connected to the i-th CN),
$\alpha_{i,j'}(x_{i,j'})$ is one of the s smallest sub-messages $\alpha_{i,j'}(d)$ over all $d \in GF(q)$. Let
$\mathcal{L}_i(x_{i,j} = d \mid s)$ be the set of s-truncated configurations. Taking s < q requires an extra
procedure, known as message truncation, which sorts the sub-messages and discards the q - s largest ones. Let
$t$ and $t_{\max}$ denote the iteration counter and the maximum number of iterations, respectively.
Figure 4. Top-level architecture design
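Message truncation amounts to keeping the s smallest LLR entries of a q-entry message together with their GF symbols. A minimal Python sketch (the function name and the sample message are hypothetical) uses `heapq.nsmallest`, which returns the same result as a full sort cut to s entries but only needs O(q log s) work:

```python
import heapq

def truncate(message, s):
    """Keep the s most reliable entries of an LLR message.

    message[d] is the LLR of GF symbol d (0 for the most likely symbol,
    larger values for less reliable ones). Returns (symbol, llr) pairs in
    increasing LLR order; the q - s largest entries are discarded.
    """
    return heapq.nsmallest(s, enumerate(message), key=lambda pair: pair[1])

msg = [0.0, 3.2, 1.5, 0.7, 2.1, 4.0, 0.9, 5.5]   # hypothetical q = 8 message
print(truncate(msg, 3))  # [(0, 0.0), (3, 0.7), (6, 0.9)]
```

Keeping the symbol index with each value is essential, since the check node later combines entries by their GF symbols, not by their positions.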


3.3. Definition of NB LLR values 

The first step of the min-sum technique is to calculate the LLR value of each symbol in the
codeword. Assuming that the GF(q) symbols are equiprobable [21], the LLR value $L_k(x)$ of the k-th symbol is:


$L_k(x) = \ln\left[\frac{P(y_k \mid \hat{x}_k)}{P(y_k \mid x)}\right]$ (8)


where $y_k$ is the received symbol and $\hat{x}_k$ is the symbol of GF(q) that maximizes $P(y_k \mid x)$,
i.e., $\hat{x}_k = \arg\max_{x \in GF(q)} P(y_k \mid x)$. Note that $L_k(\hat{x}_k) = 0$ and $L_k(x) \ge 0$ for
all $x \in GF(q)$. As a result, the larger a symbol's LLR, the less reliable that symbol is. With a
finite-precision representation of the LLR values, this formulation removes the need to re-normalize the
messages after each node update and reduces the effect of quantization. Figure 5 shows a block diagram of
the LLR computation, and Figure 6 shows a timing diagram of the LLR computation over GF(16) with n_m = 10.


Figure 5. Block diagram of the LLR computation (inputs Y_0, Y_1, ..., Y_{m-1}; expansion, memorization and
merging stages; LLR outputs up to L(C - 1))
Figure 6. Timing diagram of the LLR computation over GF(16), n_m = 10
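Equation (8) can be illustrated numerically. The channel model in this sketch is an assumption, not taken from the paper: each of the m bits of a GF(2^m) symbol is sent as ±1 on one real AWGN dimension (as with Gray-mapped QPSK per dimension). Under Gaussian noise, the ratio in (8) reduces to a difference of squared Euclidean distances, so the most likely symbol gets LLR 0 and all others are non-negative. Names and sample values are hypothetical.

```python
def symbol_llrs(y, m, sigma):
    """LLR vector L(x) = ln[P(y | x_hat) / P(y | x)] for all x in GF(2^m).

    Assumed channel model (not from the paper): bit i of symbol x is sent as
    1 - 2*bit_i on one real AWGN dimension with noise std dev sigma, and y
    holds the m received samples. Then ln P(y | x) equals a constant minus
    ||y - s(x)||^2 / (2 sigma^2), so the constant cancels in the ratio.
    """
    q = 1 << m
    cost = []
    for x in range(q):
        s = [1 - 2 * ((x >> i) & 1) for i in range(m)]   # +/-1 per bit
        d2 = sum((yi - si) ** 2 for yi, si in zip(y, s))
        cost.append(d2 / (2 * sigma ** 2))                # -ln P(y|x) + const
    best = min(cost)                  # cost of x_hat, the ML symbol
    return [c - best for c in cost]   # L(x_hat) = 0, L(x) >= 0 otherwise

y = [0.9, -1.1, 0.2, 1.3]             # hypothetical received samples, GF(16)
llr = symbol_llrs(y, m=4, sigma=0.8)
kept = sorted(range(16), key=llr.__getitem__)[:10]   # n_m = 10 entries
print(kept[0], llr[kept[0]])          # 2 0.0
```

Only the second sample is negative, so the hard decision is the symbol with bit 1 set, i.e. 2; keeping the ten smallest entries mirrors the GF(16), n_m = 10 configuration of Figure 6.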


3.4. Check node architecture 

The check node processor (CNP) can be built using either the forward-backward (FB) architecture or a
tree-based structure [22]. The tree structure has the advantage of reducing the number of elementary check
nodes (ECNs) on the critical path to a minimum and of ensuring consistency across all outputs. For these
reasons, we adopted the tree structure in this study.

The symbols of the messages entering the CNP must be multiplied by the non-zero entries of the
parity check matrix, as illustrated in Figure 7, and the symbols of the CNP's output messages must be divided
by these same non-zero elements (those of the row of H corresponding to the CNP in the Tanner graph). The
implemented CNP architecture therefore performs these GF(q) multiplications using the hardwired multipliers
described in [23]-[25].


Figure 7. Design of the CN block
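A reference model of an elementary check node (ECN) makes the tree-based CN operation concrete. In EMS-style decoders, an ECN combines two truncated messages: the check-sum addition over GF(2^m) is a bitwise XOR of the symbols, the LLRs add, and only the n_m most reliable distinct output symbols are kept. This sketch explores all pairs exhaustively (practical ECNs explore far fewer), assumes the input symbols have already been multiplied by the corresponding non-zero entries of H, and uses hypothetical names.

```python
def ecn(a, b, n_m):
    """Elementary check node: min-sum combination of two truncated messages.

    a, b : lists of (symbol, llr) pairs over GF(2^m), symbols as integers.
    The check-sum addition in GF(2^m) is a bitwise XOR of the symbols; for
    each output symbol the smallest LLR sum is kept, then the n_m most
    reliable outputs are returned in increasing LLR order.
    """
    best = {}
    for sym_a, llr_a in a:
        for sym_b, llr_b in b:
            sym, llr = sym_a ^ sym_b, llr_a + llr_b
            if sym not in best or llr < best[sym]:
                best[sym] = llr
    return sorted(best.items(), key=lambda kv: kv[1])[:n_m]

def tree_combine(messages, n_m):
    """Combine several truncated messages with a balanced tree of ECNs."""
    level = list(messages)
    while len(level) > 1:
        nxt = [ecn(level[i], level[i + 1], n_m)
               for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:          # the odd message out goes to the next level
            nxt.append(level[-1])
        level = nxt
    return level[0]

a = [(0, 0.0), (1, 0.5), (2, 1.0)]
b = [(0, 0.0), (3, 0.2), (1, 0.8)]
print(ecn(a, b, 3))  # [(0, 0.0), (3, 0.2), (1, 0.5)]
```

To produce the extrinsic output for one edge, `tree_combine` would be called on all input messages except the one arriving on that edge; the tree depth, and hence the number of ECNs on the critical path, grows only logarithmically with the check node degree.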


3.5. Variable node architecture 

We start the VN operations immediately after the CN operation, once the C-to-V messages have been
transferred to memory. The updated message is computed from the C-to-V messages and the previous V-to-C
message, taking into account that the GF indices must match. To reduce latency, only L(svn) entries of the
vector L are scanned, where L(svn) is the length of the sorter used in the VN. Due to space constraints, we
only name the sorter in the memory and provide the algorithm, leaving the rest of the discussion for a
subsequent publication. Figure 8 shows the architecture of the VN block, and Figure 9 shows the architecture
of the sorter block in the VN.
Figure 9. Architecture of the sorter block in VN
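The exact VN algorithm is deferred to a later publication, so the following is only a generic EMS-style VN update, not the authors' method. Intrinsic LLRs are added to the incoming truncated C2V messages symbol by symbol (matching GF indices), the sum is re-normalized so that the best symbol has LLR 0, and a sorter keeps L(svn) entries. Giving a symbol missing from a truncated message that message's largest kept LLR plus an offset is an assumption, and the offset value and all names are hypothetical.

```python
import heapq

def vn_update(channel, c2v_msgs, n_svn, offset=0.5):
    """Produce one V2C message from intrinsic and C2V information.

    channel  : full LLR vector (length q) of the symbol, as in equation (8).
    c2v_msgs : truncated C2V messages from the other check nodes, each a
               list of (symbol, llr) pairs.
    n_svn    : number of entries kept by the VN sorter (L(svn) in the text).
    offset   : hypothetical correction; a symbol missing from a truncated
               message is given that message's largest kept LLR plus offset.
    """
    q = len(channel)
    total = list(channel)
    for msg in c2v_msgs:
        lookup = dict(msg)
        default = max(lookup.values()) + offset
        for d in range(q):
            total[d] += lookup.get(d, default)
    best = min(total)
    normalized = [t - best for t in total]   # most likely symbol gets LLR 0
    # Sorter stage: keep only the n_svn most reliable (symbol, llr) pairs.
    return heapq.nsmallest(n_svn, enumerate(normalized), key=lambda p: p[1])

channel = [0.0, 1.0, 2.0, 3.0]               # hypothetical GF(4) example
print(vn_update(channel, [[(1, 0.0), (0, 0.5)]], n_svn=2))
# [(0, 0.0), (1, 0.5)]
```

Limiting the sorter output to L(svn) entries is what bounds the VN latency: only the few most reliable candidates need to be ordered, regardless of q.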


4. FPGA RESULTS AND PROTOTYPE 

The proposed decoder architecture was prototyped on a Cyclone IV FPGA for a (2, 4)-regular
(960, 480) NB-LDPC code over GF(64). The FPGA decoder performs 10 decoding iterations and runs at 100 MHz
with a throughput of 2.44 Mbps. Using the proposed design, we were able to map a semi-complete decoder onto
this FPGA device. This result is a substantial improvement over existing GSBP decoder implementations.
Layered decoding improves convergence: simulations show that the average number of iterations at
FER = $10^{-5}$ decreases from 2.39 to 1.77. Table 1 summarizes the timing results, including the input and
output timing requirements. Figure 10 illustrates the decoding output, and Figure 11 illustrates the
performance of the NB-LDPC code over GF(64) using the GSBP algorithm.


Table 1. Summary of the timing

Parameter                                        Value
Delay                                            8.487 ns
Maximum frequency                                117.830 MHz
Minimum input arrival time before clock          5.093 ns
Maximum output required time after clock         12.119 ns
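The first two rows of Table 1 are consistent with each other: the maximum frequency is the reciprocal of the reported delay, which therefore plays the role of the minimum clock period. The 100 MHz clock used by the prototype is below this bound.

$$f_{\max} = \frac{1}{T_{\min}} = \frac{1}{8.487\ \text{ns}} \approx 117.83\ \text{MHz}$$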
Figure 10. Decoding output
Figure 11. Performances of NB-LDPC over GF (64) using the GSBP algorithm


5. CONCLUSION 

A hardware-oriented decoder based on the simplified extended min-sum algorithm has been presented.
The approach centers on considerably reducing the combinatorial search space and the computation time. To
minimize latency and enable effective pipeline scheduling, VN and CN designs based on skimming, prefetching,
and relaxed redundancy control were proposed. To minimize pipeline stalls, a conflict-free memory was used to
handle data hazards. The results indicate that the decoder performs well. According to our results, the
design considerably reduces computational complexity and memory usage compared with the GSBP method for
NB-LDPC codes.